Accelerating the Sweep3D for a Graphic Processor Unit

Authors

  • Chunye Gong
  • Jie Liu
  • Haitao Chen
  • Jing Xie
  • Zhenghu Gong
Abstract

As a powerful and flexible processor, the Graphics Processing Unit (GPU) offers great capability for many high-performance computing applications. Sweep3D, which deterministically simulates single-group, time-independent discrete ordinates (Sn) neutron transport on a 3D Cartesian geometry, represents the key part of a real ASCI application. The wavefront process used for parallel computation in Sweep3D limits the number of concurrent threads on the GPU. In this paper, we present multi-dimensional optimization methods for Sweep3D that can be implemented efficiently on the fine-grained parallel architecture of the GPU. Our results show that the overall performance of Sweep3D on the CPU-GPU hybrid platform can be improved by up to 4.38 times compared with the CPU-based implementation.

Keywords: Sweep3D, Neutron Transport, GPU, CUDA
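The limit that the wavefront process places on concurrency can be illustrated with a small sketch. It is not the authors' implementation; it assumes a simplified model of a single sweep direction in which cell (i, j, k) depends on its three upstream neighbours, so that only cells on the same diagonal plane i + j + k = d can be processed at the same time. The function name `wavefront_sizes` and the 4 x 4 x 4 grid are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product


def wavefront_sizes(nx, ny, nz):
    """Count the cells on each diagonal wavefront i + j + k = d.

    Assumption: one sweep direction, where each cell needs results from
    its upstream neighbours (i-1, j, k), (i, j-1, k) and (i, j, k-1).
    Cells that share the same d are then mutually independent.
    """
    counts = Counter(
        i + j + k for i, j, k in product(range(nx), range(ny), range(nz))
    )
    # d runs from 0 to (nx - 1) + (ny - 1) + (nz - 1), inclusive.
    return [counts[d] for d in range(nx + ny + nz - 2)]


sizes = wavefront_sizes(4, 4, 4)
print("cells per wavefront:", sizes)
print("peak concurrency:", max(sizes), "of", sum(sizes), "cells")
```

Under this model, the first and last wavefronts contain a single cell each, and even the widest wavefront covers only a fraction of the grid, which is one way to see why a plain wavefront sweep leaves many GPU threads idle.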


Similar resources

Fast Cellular Automata Implementation on Graphic Processor Unit (GPU) for Salt and Pepper Noise Removal

Noise removal is commonly applied as a pre-processing step before subsequent image processing tasks, because noise arises during acquisition or transmission. A common problem in imaging systems that use CMOS or CCD sensors is the appearance of salt and pepper noise. This paper presents a Cellular Automata (CA) framework for noise removal of distorted image by the salt an...


Determining the Proper compression Algorithm for Biomedical Signals and Design of an Optimum Graphic System to Display Them (TECHNICAL NOTES)

In this paper, the need for a data reduction algorithm when using digital graphic systems to display biomedical signals is first addressed, and then several such algorithms are compared from different points of view (such as complexity, real-time feasibility, etc.). Subsequently, it is concluded that the Turning Point algorithm can be a suitable one for real-time implementation on a microproc...


Training Recurrent Neural Network Using Multistream Extended Kalman Filter on Multicore Processor and Cuda Enabled Graphic Processor Unit

Recurrent neural networks are popular tools for modeling time series. Common gradient-based algorithms are frequently used for training recurrent neural networks. On the other hand, approaches based on Kalman filtering are considered to be the most appropriate general-purpose training algorithms with respect to modeling accuracy. Their main drawbacks are high computational requirem...


Accelerating Network Coding on Many-core GPUs and Multi-core CPUs

Network coding has recently been widely applied in various distributed systems to improve throughput and/or resilience to network dynamics. However, the computational overhead introduced by network coding operations is not negligible and has become an obstacle to the practical deployment of network coding. In this paper, we exploit the computing power of commodity many-core Graphic Processin...


A Parallel Approach for Visualization of Relief Textures

With the continuous increase in processing power, the graphics hardware, also called the Graphic Processor Unit (GPU), is naturally taking over most of the rendering pipeline, leaving the Central Processor Unit (CPU) with more idle time. In order to take advantage of this when rendering relief textures, the present work proposes two approaches for the mapping of relief textures. Both methods are...



Journal title:
  • JIPS

Volume 7  Issue 

Pages  -

Publication date 2011